{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "6453d3d5",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/anthropic.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72ed6f61-28a7-4f90-8a45-e3f452f95dbd",
   "metadata": {},
   "source": [
    "# Anthropic\n",
    "\n",
    "Anthropic offers many state-of-the-art models from the haiku, sonnet, and opus families.\n",
    "\n",
    "Read on to learn how to use these models with LlamaIndex!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c78b172f",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e31874a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-anthropic"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cbf8694-ad53-459a-84c1-64de2dadeaf5",
   "metadata": {},
   "source": [
    "#### Set Tokenizer\n",
    "\n",
    "First we want to set the tokenizer, which is slightly different than TikToken. This ensures that token counting is accurate throughout the library.\n",
    "\n",
    "**NOTE**: Anthropic recently updated their token counting API. Older models like claude-2.1 are no longer supported for token counting in the latest versions of the Anthropic python client."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6ac37cb-b588-44c7-8fd9-8eab454900a5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "from llama_index.core import Settings\n",
    "\n",
    "tokenizer = Anthropic().tokenizer\n",
    "Settings.tokenizer = tokenizer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b81a3ef6-2ee5-460d-9aa4-f73708774014",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85fbba23",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"ANTHROPIC_API_KEY\"] = \"sk-...\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "712ea8f4",
   "metadata": {},
   "source": [
    "You can call `complete` with a prompt:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "910b50ad-c55e-487e-8808-5905dfaa78b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "# To customize your API key, do this\n",
    "# otherwise it will lookup ANTHROPIC_API_KEY from your env variable\n",
    "# llm = Anthropic(api_key=\"<api_key>\")\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "resp = llm.complete(\"Who is Paul Graham?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dfda925e-89c5-47a6-9311-16916ab08b66",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a computer programmer, entrepreneur, venture capitalist, and essayist. Here are the key things he's known for:\n",
      "\n",
      "**Y Combinator**: He co-founded this highly influential startup accelerator in 2005, which has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Y Combinator provides seed funding and mentorship to early-stage startups.\n",
      "\n",
      "**Programming**: He's a respected figure in the programming community, particularly known for his work with Lisp programming language and for co-creating the first web-based application, Viaweb, in the 1990s (which was sold to Yahoo and became Yahoo Store).\n",
      "\n",
      "**Writing**: Graham is well-known for his thoughtful essays on startups, technology, programming, and society, published on his website paulgraham.com. His essays are widely read in tech circles and cover topics like how to start a startup, the nature of innovation, and social commentary.\n",
      "\n",
      "**Books**: He's authored several books including \"Hackers & Painters\" and \"On Lisp.\"\n",
      "\n",
      "**Influence**: He's considered one of the most influential people in Silicon Valley's startup ecosystem, both through Y Combinator's impact and his writings on entrepreneurship and technology.\n",
      "\n",
      "Graham is known for his analytical thinking and contrarian perspectives on business, technology, and culture.\n"
     ]
    }
   ],
   "source": [
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09c27c58",
   "metadata": {},
   "source": [
    "You can also call `chat` with a list of chat messages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7a79dd31",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "assistant: Ahoy there, matey! *adjusts tricorn hat and strokes beard* \n",
      "\n",
      "Let me spin ye a tale from me seafarin' days, when the ocean was as wild as a kraken's temper and twice as unpredictable!\n",
      "\n",
      "**The Tale of the Singing Compass**\n",
      "\n",
      "'Twas a foggy mornin' when me crew and I discovered the strangest treasure - not gold or jewels, mind ye, but a compass that hummed sea shanties! Aye, ye heard right! This peculiar little instrument would warble different tunes dependin' on which direction it pointed.\n",
      "\n",
      "North brought forth a melancholy ballad about lost loves, while South sang a jaunty tune that made even our grumpiest sailor, One-Eyed Pete, tap his peg leg. But here's the curious part - when it pointed West, it sang a mysterious melody none of us had ever heard, with words in an ancient tongue.\n",
      "\n",
      "Bein' the adventurous sort (and perhaps a wee bit foolish), we followed that western song for three days and three nights. The compass led us through treacherous waters, past islands that seemed to shimmer like mirages, until we reached a hidden cove where the water glowed like liquid emeralds.\n",
      "\n",
      "And there, me hearty friend, we found the greatest treasure of all - not riches, but a family of merfolk who had been waitin' centuries for someone to return their enchanted compass! They rewarded our kindness with safe passage through any storm and the secret locations of three genuine treasure islands.\n",
      "\n",
      "*winks and takes a swig from an imaginary bottle*\n",
      "\n",
      "Sometimes the best adventures come from followin' the strangest songs, savvy?\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"system\", content=\"You are a pirate with a colorful personality\"\n",
    "    ),\n",
    "    ChatMessage(role=\"user\", content=\"Tell me a story\"),\n",
    "]\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "resp = llm.chat(messages)\n",
    "\n",
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee87da7e",
   "metadata": {},
   "source": [
    "## Streaming Support\n",
    "\n",
    "Every method supports streaming through the `stream_` prefix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e49a2681",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a computer programmer, entrepreneur, venture capitalist, and essayist. Here are the key things he's known for:\n",
      "\n",
      "**Y Combinator Co-founder**: He co-founded Y Combinator in 2005, one of the most successful startup accelerators in the world. Y Combinator has funded companies like Airbnb, Dropbox, Stripe, Reddit, and hundreds of others.\n",
      "\n",
      "**Programming and Lisp**: He's a strong advocate for the Lisp programming language and wrote influential books including \"On Lisp\" and \"ANSI Common Lisp.\"\n",
      "\n",
      "**Viaweb**: In the 1990s, he co-founded Viaweb, one of the first web-based software companies, which was acquired by Yahoo in 1998 and became Yahoo Store.\n",
      "\n",
      "**Essays**: He's written many influential essays on startups, programming, and technology, published on his website paulgraham.com. His essays are widely read in the tech community and cover topics like how to start a startup, what makes a good programmer, and the nature of innovation.\n",
      "\n",
      "**Art and Academia**: He has a PhD in Computer Science from Harvard and also studied painting at the Rhode Island School of Design and the Accademia di Belle Arti in Florence.\n",
      "\n",
      "Graham is considered one of the most influential figures in the startup ecosystem and has helped shape modern thinking about entrepreneurship and technology startups."
     ]
    }
   ],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "resp = llm.stream_complete(\"Who is Paul Graham?\")\n",
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2cffe6ec",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a computer programmer, entrepreneur, venture capitalist, and essayist. Here are the key things he's known for:\n",
      "\n",
      "**Y Combinator**: He co-founded this highly influential startup accelerator in 2005, which has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Y Combinator provides seed funding and mentorship to early-stage startups.\n",
      "\n",
      "**Programming**: He's a respected figure in the programming community, particularly known for his work with Lisp programming language and for co-creating the first web-based application, Viaweb, in the 1990s (which was sold to Yahoo and became Yahoo Store).\n",
      "\n",
      "**Writing**: Graham is well-known for his thoughtful essays on startups, technology, programming, and entrepreneurship, published on his website paulgraham.com. His essays are widely read in tech circles and cover topics like how to start a startup, the nature of innovation, and technology trends.\n",
      "\n",
      "**Influence**: He's considered one of the most influential people in Silicon Valley's startup ecosystem, both through Y Combinator's success and his writings that have shaped how many people think about entrepreneurship and technology.\n",
      "\n",
      "His combination of technical expertise, business acumen, and clear writing has made him a prominent voice in the tech industry for over two decades."
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(role=\"user\", content=\"Who is Paul Graham?\"),\n",
    "]\n",
    "\n",
    "resp = llm.stream_chat(messages)\n",
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0624c7bd",
   "metadata": {},
   "source": [
    "## Async Usage\n",
    "\n",
    "Every synchronous method has an async counterpart."
   ]
  },
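  {
   "cell_type": "markdown",
   "id": "f0e1d2c3",
   "metadata": {},
   "source": [
    "For example, the async counterpart of `complete` is `acomplete` (a minimal sketch, assuming the same model setup as above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a9b8c7d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "# `acomplete` is the async counterpart of `complete`\n",
    "resp = await llm.acomplete(\"Who is Paul Graham?\")\n",
    "print(resp)"
   ]
  },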
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94f72965",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Paul Graham is a computer programmer, entrepreneur, venture capitalist, and essayist. Here are the key things he's known for:\n",
      "\n",
      "**Y Combinator**: He co-founded this highly influential startup accelerator in 2005, which has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Y Combinator provides seed funding and mentorship to early-stage startups.\n",
      "\n",
      "**Programming**: He's a respected figure in the programming community, particularly known for his work with Lisp programming language. He wrote influential books like \"On Lisp\" and \"ANSI Common Lisp.\"\n",
      "\n",
      "**Essays**: Graham writes widely-read essays on startups, technology, programming, and society, published on his website paulgraham.com. His essays like \"Do Things That Don't Scale\" and \"How to Start a Startup\" are considered essential reading in the tech world.\n",
      "\n",
      "**Entrepreneur**: Before Y Combinator, he co-founded Viaweb (one of the first web-based applications for building online stores), which was acquired by Yahoo in 1998 for about $49 million and became Yahoo Store.\n",
      "\n",
      "**Art background**: Interestingly, he also has a background in art and studied painting, which influences his perspective on creativity and aesthetics in technology.\n",
      "\n",
      "Graham is considered one of the most influential voices in Silicon Valley and the broader startup ecosystem."
     ]
    }
   ],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "resp = await llm.astream_complete(\"Who is Paul Graham?\")\n",
    "async for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4b04cfd0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "assistant: Paul Graham is a computer programmer, entrepreneur, venture capitalist, and essayist. Here are the key things he's known for:\n",
      "\n",
      "**Y Combinator**: He co-founded this highly influential startup accelerator in 2005, which has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Y Combinator provides seed funding and mentorship to early-stage startups.\n",
      "\n",
      "**Programming**: He's a respected figure in the programming community, particularly known for his work with Lisp programming language and for co-creating the first web-based application, Viaweb, in the 1990s (which was sold to Yahoo and became Yahoo Store).\n",
      "\n",
      "**Writing**: Graham is well-known for his thoughtful essays on startups, technology, programming, and society, published on his website paulgraham.com. His essays are widely read in tech circles and cover topics like how to start a startup, the nature of innovation, and social commentary.\n",
      "\n",
      "**Books**: He's authored several books including \"Hackers & Painters\" and \"On Lisp.\"\n",
      "\n",
      "**Influence**: He's considered one of the most influential people in Silicon Valley's startup ecosystem, both through Y Combinator's impact and his writings on entrepreneurship and technology.\n",
      "\n",
      "Graham is known for his analytical thinking and contrarian perspectives on business, technology, and culture.\n"
     ]
    }
   ],
   "source": [
    "messages = [\n",
    "    ChatMessage(role=\"user\", content=\"Who is Paul Graham?\"),\n",
    "]\n",
    "\n",
    "resp = await llm.achat(messages)\n",
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f25ccf92",
   "metadata": {},
   "source": [
    "## Vertex AI Support\n",
    "\n",
    "By providing the `region` and `project_id` parameters (either through environment variables or directly), you can use an Anthropic model through Vertex AI."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4a9db35",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"ANTHROPIC_PROJECT_ID\"] = \"YOUR PROJECT ID HERE\"\n",
    "os.environ[\"ANTHROPIC_REGION\"] = \"YOUR PROJECT REGION HERE\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed545c13",
   "metadata": {},
   "source": [
    "Do keep in mind that setting region and project_id here will make Anthropic use the Vertex AI client"
   ]
  },
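  {
   "cell_type": "markdown",
   "id": "b2c3d4e5",
   "metadata": {},
   "source": [
    "As a sketch of passing the parameters directly (this assumes your Google Cloud credentials are already configured; the exact model id available on Vertex AI may differ):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d4e5f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "# region and project_id can also be passed directly instead of via env vars\n",
    "llm = Anthropic(\n",
    "    model=\"claude-sonnet-4-0\",\n",
    "    region=\"us-east5\",\n",
    "    project_id=\"YOUR PROJECT ID HERE\",\n",
    ")\n",
    "\n",
    "resp = llm.complete(\"Who is Paul Graham?\")"
   ]
  },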
  {
   "cell_type": "markdown",
   "id": "cf8b6f49",
   "metadata": {},
   "source": [
    "## Bedrock Support\n",
    "\n",
    "LlamaIndex also supports Anthropic models through AWS Bedrock."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8df734a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "# Note: this assumes you have standard AWS credentials configured in your environment\n",
    "llm = Anthropic(\n",
    "    model=\"anthropic.claude-3-7-sonnet-20250219-v1:0\",\n",
    "    aws_region=\"us-east-1\",\n",
    ")\n",
    "\n",
    "resp = llm.complete(\"Who is Paul Graham?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37a35c7e",
   "metadata": {},
   "source": [
    "## Multi-Modal Support\n",
    "\n",
    "Using `ChatMessage` objects, you can pass in images and text to the LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36874ac9",
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg -O image.jpg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "152e2e9f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "assistant: This image shows four wooden dice on a dark fabric surface. The dice appear to be made of light-colored wood and have the traditional black dots (pips) marking the numbers on each face. They are scattered casually on what looks like a dark blue or black cloth background.\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock\n",
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"user\",\n",
    "        blocks=[\n",
    "            ImageBlock(path=\"image.jpg\"),\n",
    "            TextBlock(text=\"What is in this image?\"),\n",
    "        ],\n",
    "    )\n",
    "]\n",
    "\n",
    "resp = llm.chat(messages)\n",
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03aed884",
   "metadata": {},
   "source": [
    "## Prompt Caching\n",
    "\n",
    "Anthropic models support the idea of prompt cahcing -- wherein if a prompt is repeated multiple times, or the start of a prompt is repeated, the LLM can reuse pre-calculated attention results to speed up the response and lower costs.\n",
    "\n",
    "To enable prompt caching, you can set `cache_control` on your `ChatMessage` objects, or set `cache_idx` on the LLM to always cache the first X messages (with -1 being all messages)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1027338",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "# cache individual message(s)\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"user\",\n",
    "        content=\"<some very long prompt>\",\n",
    "        additional_kwargs={\"cache_control\": {\"type\": \"ephemeral\"}},\n",
    "    ),\n",
    "]\n",
    "\n",
    "resp = llm.chat(messages)\n",
    "\n",
    "# cache first X messages (with -1 being all messages)\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\", cache_idx=-1)\n",
    "\n",
    "resp = llm.chat(messages)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e33bfa1f-589e-475a-8eb3-fa37d95759a7",
   "metadata": {},
   "source": [
    "## Structured Prediction\n",
    "\n",
    "LlamaIndex provides an intuitive interface for converting any Anthropic LLMs into a structured LLM through `structured_predict` - simply define the target Pydantic class (can be nested), and given a prompt, we extract out the desired object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8b622d1-0c91-4cde-ab4c-3f83e0127a4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "from llama_index.core.prompts import PromptTemplate\n",
    "from llama_index.core.bridge.pydantic import BaseModel\n",
    "from typing import List\n",
    "\n",
    "\n",
    "class MenuItem(BaseModel):\n",
    "    \"\"\"A menu item in a restaurant.\"\"\"\n",
    "\n",
    "    course_name: str\n",
    "    is_vegetarian: bool\n",
    "\n",
    "\n",
    "class Restaurant(BaseModel):\n",
    "    \"\"\"A restaurant with name, city, and cuisine.\"\"\"\n",
    "\n",
    "    name: str\n",
    "    city: str\n",
    "    cuisine: str\n",
    "    menu_items: List[MenuItem]\n",
    "\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "prompt_tmpl = PromptTemplate(\n",
    "    \"Generate a restaurant in a given city {city_name}\"\n",
    ")\n",
    "\n",
    "# Option 1: Use `as_structured_llm`\n",
    "restaurant_obj = (\n",
    "    llm.as_structured_llm(Restaurant)\n",
    "    .complete(prompt_tmpl.format(city_name=\"Miami\"))\n",
    "    .raw\n",
    ")\n",
    "# Option 2: Use `structured_predict`\n",
    "# restaurant_obj = llm.structured_predict(Restaurant, prompt_tmpl, city_name=\"Miami\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec769384-d3fe-4761-99e3-bdddcb9c7e4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Restaurant(name='Ocean Breeze Bistro', city='Miami', cuisine='Seafood', menu_items=[MenuItem(course_name='Grilled Mahi-Mahi with Mango Salsa', is_vegetarian=False), MenuItem(course_name='Coconut Shrimp with Pineapple Dipping Sauce', is_vegetarian=False), MenuItem(course_name='Quinoa and Black Bean Bowl', is_vegetarian=True), MenuItem(course_name='Key Lime Pie', is_vegetarian=True), MenuItem(course_name='Lobster Bisque', is_vegetarian=False), MenuItem(course_name='Grilled Vegetable Platter with Chimichurri', is_vegetarian=True)])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "restaurant_obj"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c63fb270-3d7d-4b9c-9cda-aaa42004c7dd",
   "metadata": {},
   "source": [
    "#### Structured Prediction with Streaming\n",
    "\n",
    "Any LLM wrapped with `as_structured_llm` supports streaming through `stream_chat`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6e91261-e4ef-4706-8325-65d782f9a87b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'city': 'San Francisco',\n",
      " 'cuisine': 'California Fusion',\n",
      " 'menu_items': [{'course_name': 'Dungeness Crab Cakes', 'is_vegetarian': False},\n",
      "                {'course_name': 'Roasted Beet and Arugula Salad',\n",
      "                 'is_vegetarian': True},\n",
      "                {'course_name': 'Grilled Pacific Salmon',\n",
      "                 'is_vegetarian': False},\n",
      "                {'course_name': 'Wild Mushroom Risotto', 'is_vegetarian': True},\n",
      "                {'course_name': 'Grass-Fed Beef Tenderloin',\n",
      "                 'is_vegetarian': False},\n",
      "                {'course_name': 'Chocolate Lava Cake', 'is_vegetarian': True}],\n",
      " 'name': 'Golden Gate Bistro'}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Restaurant(name='Golden Gate Bistro', city='San Francisco', cuisine='California Fusion', menu_items=[MenuItem(course_name='Dungeness Crab Cakes', is_vegetarian=False), MenuItem(course_name='Roasted Beet and Arugula Salad', is_vegetarian=True), MenuItem(course_name='Grilled Pacific Salmon', is_vegetarian=False), MenuItem(course_name='Wild Mushroom Risotto', is_vegetarian=True), MenuItem(course_name='Grass-Fed Beef Tenderloin', is_vegetarian=False), MenuItem(course_name='Chocolate Lava Cake', is_vegetarian=True)])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "from IPython.display import clear_output\n",
    "from pprint import pprint\n",
    "\n",
    "input_msg = ChatMessage.from_str(\"Generate a restaurant in San Francisco\")\n",
    "\n",
    "sllm = llm.as_structured_llm(Restaurant)\n",
    "stream_output = sllm.stream_chat([input_msg])\n",
    "for partial_output in stream_output:\n",
    "    clear_output(wait=True)\n",
    "    pprint(partial_output.raw.dict())\n",
    "    restaurant_obj = partial_output.raw\n",
    "\n",
    "restaurant_obj"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc778794",
   "metadata": {},
   "source": [
    "## Model Thinking\n",
    "\n",
    "With `claude-3.7 Sonnet`, you can enable the model to \"think\" harder about a task, generating a chain-of-thought response before writing out the final answer.\n",
    "\n",
    "You can enable this by passing in the `thinking_dict` parameter to the constructor, specififying the amount of tokens to reserve for the thinking process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbf4c90f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "llm = Anthropic(\n",
    "    model=\"claude-sonnet-4-0\",\n",
    "    # max_tokens must be greater than budget_tokens\n",
    "    max_tokens=64000,\n",
    "    # temperature must be 1.0 for thinking to work\n",
    "    temperature=1.0,\n",
    "    thinking_dict={\"type\": \"enabled\", \"budget_tokens\": 1600},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "94018d16",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I'll solve this step by step.\n",
      "\n",
      "First, let me calculate the numerator:\n",
      "1234 × 3421 = 4,221,514\n",
      "\n",
      "Next, let me calculate the denominator:\n",
      "231 + 2341 = 2,572\n",
      "\n",
      "Now I can divide:\n",
      "4,221,514 ÷ 2,572 = 1,641.42 (rounded to 2 decimal places)\n",
      "\n",
      "Therefore: (1234 × 3421) ÷ (231 + 2341) = **1,641.42**\n",
      "I'll solve this step by step.\n",
      "\n",
      "First, let me calculate the numerator:\n",
      "1234 × 3421 = 4,221,514\n",
      "\n",
      "Next, let me calculate the denominator:\n",
      "231 + 2341 = 2,572\n",
      "\n",
      "Now I can divide:\n",
      "4,221,514 ÷ 2,572 = 1,641.42 (rounded to 2 decimal places)\n",
      "\n",
      "Therefore: (1234 × 3421) ÷ (231 + 2341) = **1,641.42**\n"
     ]
    }
   ],
   "source": [
    "messages = [\n",
    "    ChatMessage(role=\"user\", content=\"(1234 * 3421) / (231 + 2341) = ?\")\n",
    "]\n",
    "\n",
    "resp_gen = llm.stream_chat(messages)\n",
    "\n",
    "for r in resp_gen:\n",
    "    print(r.delta, end=\"\")\n",
    "\n",
    "print()\n",
    "print(r.message.content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b573bbb6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "EsgICkYIAxgCKkBcW71ZZ3zt/vVxd0Aw2evRNOsyewVAaXXFcHa2zRC5O/TG/Db+RfgHqKNF7EWL0WuJKRXJZ20Y/vfuHrqMZPS3EgytQs6cFIcaRHzCK4UaDPxTA/o9OoluSRUt+CIw3ivbQzQiJz4rxrk+anAcYu8kGwe16Ig9vHtamNNJG0C3Ou2cy9bshhs5Wg0TFksMKq8HeFZ7D0AurTGciRSrsmwvNNnAG7JkmeNAFWED0AWlMXBwaX910DSs9dqGkd8ZKVnj5hH/+pVs6KjD0rdttpu62PEZS0TAT+1i0nWoydpSZRj2QByzway45NNBitBoZGnc60G+Hu+oE3Ju4JNthVpbGR9NI0YGk/+hkh+xF4LMoWJJn6sXufxo6z0nUGGmysxF/+v3SKo0BrZhA18yDV8LY327StLKaGj6/D4khxLWLbWVv2IboHq2jcKscBgmH91Rems3/Lo1ziNAq05Tg4l/Avx4l6WdiEF8g/lyT5XTGN+YxN8UOR7PqNVaQ9iHxMMVGk+mO9F3AwbP+9F7nhlrim9JPm4v4BhmGgzuKNVEFjm8woj3wmdc2Iw95RiNZY1TRAAt2bZXlwVUy+gINE4xtVjY1pPmEEmXlR1P4I2+Vjax05d4BfBsCE83aPxH/WYNYnZ95bOIz2DmAImEZXa3ineKz2JZWnWj5/T/lyEnaiB4bFqiSkUxOo+rWJrS2S8JGmyMg/zHpzHxdrrLxQQ0KmNICfDTJZDW8X0QR18DfuoVHZcOIkYgUM14GWngNcwmQeS2QiLbu+03husqHTFL27f4K8qyUn/E2hqtC/U2HNjEZFQhAsLNIunlQb4WnD6bhdx8s9DdD+rRWdyGcOe4rjW7syD/fxwrZQHDzXhbV40xA6BfHrg/vbbYfABd+fVopFzccFVfe/8rleKzLXn8OS6O+KE3rx5L+Rh0gfoc08RolFMzA/S+LGIkWN1I9JCxvsqjEcgAoYY+zkW44i6nNvbGFWuGWA/jK/D6vI1sjpO9YJ9XvGgfcXC9jmhpx86polTSIXYHOxkdNnTKTrvR/54DWKS6orSdA6+glA4ecC7ZnF+qpMkXDscSdH+Pnj+SEZMAYIFtRWT5tRhn6y1/XUxJtchWv2b3ZBQmsYCvs7YxoH3ryhSGG6SVtY9269o7a2zqwLtVMU5WqV9Jcq6NITW/RLJ7xGs/zX67u64ZaIto9VsSnRXIgaRY8HRybKB3ABXo27hiJadIecgvwO12+ds+x4O+su91gjCILOYSujTcjZ7EFXiaDOkIuFbEo6xg+wIcXxNtDcn7mQlqQ1s/JsARMxl7B9aQbz1BgQtJ6J49aIwMS03cAZrOgmmUHqy3b3ABQxMfMQOmvkm18kcypqNdW5JaXw3N0aKeboDT6StpduP+AL+fULF/uiqqxfZ+NgSEzGmqQQiPYX4djqH6O+M2cwam1BBYaDX/dGsKqRgB\n"
     ]
    }
   ],
   "source": [
    "print(r.message.additional_kwargs[\"thinking\"][\"signature\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dc35e06",
   "metadata": {},
   "source": [
    "We can also expose the exact thinking process:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ac32910e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I need to calculate (1234 * 3421) / (231 + 2341).\n",
      "\n",
      "Let me do this step by step.\n",
      "\n",
      "First, let me calculate the numerator: 1234 * 3421\n",
      "1234 * 3421\n",
      "= 1234 * (3000 + 400 + 20 + 1)\n",
      "= 1234 * 3000 + 1234 * 400 + 1234 * 20 + 1234 * 1\n",
      "= 3,702,000 + 493,600 + 24,680 + 1,234\n",
      "= 4,221,514\n",
      "\n",
      "Now let me calculate the denominator: 231 + 2341\n",
      "231 + 2341 = 2,572\n",
      "\n",
      "So the expression becomes:\n",
      "4,221,514 / 2,572\n",
      "\n",
      "Let me do this division:\n",
      "4,221,514 ÷ 2,572 = 1,641.4...\n",
      "\n",
      "Let me be more precise:\n",
      "4,221,514 ÷ 2,572 = 1,641.42...\n",
      "\n",
      "Actually, let me double-check my multiplication:\n",
      "1234 * 3421\n",
      "\n",
      "1234\n",
      "×3421\n",
      "-----\n",
      "1234 (1234 × 1)\n",
      "2468 (1234 × 2, shifted one place)\n",
      "4936 (1234 × 4, shifted two places)\n",
      "3702 (1234 × 3, shifted three places)\n",
      "-----\n",
      "\n",
      "Let me be more careful:\n",
      "1234 × 1 = 1234\n",
      "1234 × 20 = 24680\n",
      "1234 × 400 = 493600\n",
      "1234 × 3000 = 3702000\n",
      "\n",
      "1234 + 24680 + 493600 + 3702000 = 4,221,514\n",
      "\n",
      "That's correct.\n",
      "\n",
      "Now 4,221,514 ÷ 2,572 ≈ 1,641.42\n"
     ]
    }
   ],
   "source": [
    "print(r.message.additional_kwargs[\"thinking\"][\"thinking\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b7b0bbcb",
   "metadata": {},
   "source": [
    "## Tool/Function Calling\n",
    "\n",
    "Anthropic supports direct tool/function calling through the API. Using LlamaIndex, we can implement some core agentic tool calling patterns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f360774",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.tools import FunctionTool\n",
    "from llama_index.core.llms import ChatMessage\n",
    "from llama_index.llms.anthropic import Anthropic\n",
    "from datetime import datetime\n",
    "\n",
    "llm = Anthropic(model=\"claude-sonnet-4-0\")\n",
    "\n",
    "\n",
    "def get_current_time() -> dict:\n",
    "    \"\"\"Get the current time\"\"\"\n",
    "    return {\"time\": datetime.now().strftime(\"%Y-%m-%d %H:%M:%S\")}\n",
    "\n",
    "\n",
    "# uses the tool name, any type annotations, and docstring to describe the tool\n",
    "tool = FunctionTool.from_defaults(fn=get_current_time)"
   ]
  },
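Under the hood, `FunctionTool.from_defaults` reads the function's name, type annotations, and docstring to build the tool description that is sent to the LLM. Below is a rough, framework-free sketch of that idea; `describe_tool` is a hypothetical helper for illustration, and the exact schema LlamaIndex produces differs.

```python
import inspect
from typing import get_type_hints


def describe_tool(fn) -> dict:
    """Hypothetical helper: build a minimal JSON-schema-style tool
    description from a function's name, annotations, and docstring."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # only parameters go into the input schema
    to_json_type = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties = {
        name: {"type": to_json_type.get(tp, "object")} for name, tp in hints.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def get_current_time() -> dict:
    """Get the current time"""
    ...


print(describe_tool(get_current_time))
```

This is why descriptive names, annotations, and docstrings matter: they are the only information the model sees about the tool.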
  {
   "cell_type": "markdown",
   "id": "306cf697",
   "metadata": {},
   "source": [
    "We can simply do a single pass to call the tool and get the result:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a31eb615",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'time': '2025-05-22 12:45:48'}\n"
     ]
    }
   ],
   "source": [
    "resp = llm.predict_and_call([tool], \"What is the current time?\")\n",
    "print(resp)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afd7718d",
   "metadata": {},
   "source": [
    "We can also use lower-level APIs to implement an agentic tool-calling loop!"
   ]
  },
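Before wiring this up to the Anthropic API, here is a framework-free sketch of the control flow the loop below implements: call the model, execute any requested tool calls, append the results to the chat history, and repeat until no more tools are requested. `MockLLM` is a stand-in for illustration only (it returns a fixed timestamp) and is not part of LlamaIndex or the Anthropic client.

```python
# A minimal, framework-free sketch of an agentic tool-calling loop.
# MockLLM is a stand-in that requests one tool call, then answers.


def get_current_time() -> dict:
    # fixed timestamp so the example is deterministic
    return {"time": "2025-05-22 12:45:51"}


tools = {"get_current_time": get_current_time}


class MockLLM:
    def __init__(self):
        self.turn = 0

    def chat(self, history):
        self.turn += 1
        if self.turn == 1:
            # first turn: the "model" requests a tool call
            return {"tool_calls": [{"name": "get_current_time", "kwargs": {}}]}
        # after seeing the tool result, answer in plain text
        time = history[-1]["content"]["time"]
        return {"tool_calls": [], "content": f"The current time is {time}."}


llm = MockLLM()
history = [{"role": "user", "content": "What is the current time?"}]

resp = llm.chat(history)
while resp["tool_calls"]:
    for call in resp["tool_calls"]:
        # run every requested tool and record each result in the history
        result = tools[call["name"]](**call["kwargs"])
        history.append({"role": "tool", "content": result})
    # only after all tool results are appended do we query the model again
    resp = llm.chat(history)

print(resp["content"])
```

The real loop below follows the same shape, with `chat_with_tools` and `get_tool_calls_from_response` playing the roles of `MockLLM.chat` and the `tool_calls` check.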
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "59cb6fda",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Calling get_current_time with {}\n",
      "Tool output:  {'time': '2025-05-22 12:45:51'}\n",
      "Final response:  The current time is 12:45:51 on May 22, 2025.\n"
     ]
    }
   ],
   "source": [
    "chat_history = [ChatMessage(role=\"user\", content=\"What is the current time?\")]\n",
    "tools_by_name = {t.metadata.name: t for t in [tool]}\n",
    "\n",
    "resp = llm.chat_with_tools([tool], chat_history=chat_history)\n",
    "tool_calls = llm.get_tool_calls_from_response(\n",
    "    resp, error_on_no_tool_call=False\n",
    ")\n",
    "\n",
    "if not tool_calls:\n",
    "    print(resp)\n",
    "else:\n",
    "    while tool_calls:\n",
    "        # add the LLM's response to the chat history\n",
    "        chat_history.append(resp.message)\n",
    "\n",
    "        for tool_call in tool_calls:\n",
    "            tool_name = tool_call.tool_name\n",
    "            tool_kwargs = tool_call.tool_kwargs\n",
    "\n",
    "            print(f\"Calling {tool_name} with {tool_kwargs}\")\n",
    "            tool_output = tools_by_name[tool_name].call(**tool_kwargs)\n",
    "            print(\"Tool output: \", tool_output)\n",
    "            chat_history.append(\n",
    "                ChatMessage(\n",
    "                    role=\"tool\",\n",
    "                    content=str(tool_output),\n",
    "                    # most LLMs like Anthropic, OpenAI, etc. need to know the tool call id\n",
    "                    additional_kwargs={\"tool_call_id\": tool_call.tool_id},\n",
    "                )\n",
    "            )\n",
    "\n",
    "        # only re-query the LLM once all tool results are in the history\n",
    "        resp = llm.chat_with_tools([tool], chat_history=chat_history)\n",
    "        tool_calls = llm.get_tool_calls_from_response(\n",
    "            resp, error_on_no_tool_call=False\n",
    "        )\n",
    "    print(\"Final response: \", resp.message.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0e72094",
   "metadata": {},
   "source": [
    "## Server-Side Tool Calling\n",
    "\n",
    "Recent versions of the Anthropic API also support server-side tools, such as built-in web search.\n",
    "\n",
    "Here's an example of how to use it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bc2b257",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the latest research and industry reports, here are the key AI trends shaping 2025:\n",
      "\n",
      "## 1. Agentic AI Takes Center Stage\n",
      "\n",
      "Agentic AI - AI systems that can perform tasks independently with minimal human intervention - is emerging as the most significant trend for 2025. \"Think of agents as the apps of the AI era,\" according to Microsoft executives. Early implementations will focus on small, structured internal tasks like password changes or vacation requests, with companies being cautious about deploying agents for customer-facing activities involving real money.\n",
      "\n",
      "## 2. Advanced Reasoning Capabilities\n",
      "\n",
      "AI models with advanced reasoning capabilities, like OpenAI's o1, can solve complex problems with logical steps similar to human thinking, making them particularly useful in science, coding, math, law, and medicine. Tech companies are competing to develop frontier models that push boundaries in natural-language processing, image generation, and coding.\n",
      "\n",
      "## 3. Focus on Measurable ROI and Enterprise Adoption\n",
      "\n",
      "In 2025, businesses are pushing harder for measurable outcomes from generative AI: reduced costs, demonstrable ROI, and efficiency gains. Despite over 90% of organizations increasing their generative AI use, only 8% consider their initiatives mature, indicating significant room for growth in practical implementation.\n",
      "\n",
      "## 4. Scientific Discovery and Materials Science\n",
      "\n",
      "AI is increasingly being applied to scientific discovery, with materials science emerging as a promising area following AI's success in protein research. Meta has released massive datasets and models to help scientists discover new materials faster.\n",
      "\n",
      "## 5. Multimodal AI and Beyond Chatbots\n",
      "\n",
      "As AI technology matures, developers and businesses are looking beyond chatbots toward building sophisticated software applications on top of large language models rather than deploying chatbots as standalone tools.\n",
      "\n",
      "## 6. Dramatic Cost Reductions\n",
      "\n",
      "Inference costs are falling dramatically - from $20 per million tokens to $0.07 per million tokens in less than a year, with the cost for GPT-3.5-level performance dropping over 280-fold between November 2022 and October 2024.\n",
      "\n",
      "## 7. Closing Performance Gaps\n",
      "\n",
      "The performance gap between top U.S. and Chinese AI models has narrowed from 9.26% to just 1.70% in one year, while open-weight models are closing the gap with closed models, reducing the performance difference from 8% to 1.7% on some benchmarks.\n",
      "\n",
      "## 8. Increased Regulatory Activity\n",
      "\n",
      "U.S. federal agencies introduced 59 AI-related regulations in 2024 - more than double the number in 2023 - while globally, legislative mentions of AI rose 21.3% across 75 countries.\n",
      "\n",
      "## 9. Data Management Revolution\n",
      "\n",
      "Generative AI is making unstructured data important again, with 94% of data and AI leaders saying AI interest is leading to greater focus on data, driving a \"data lakehouse revolution\" that combines data lakes' flexibility with data warehouses' structure.\n",
      "\n",
      "## 10. Defense and Military Applications\n",
      "\n",
      "Defense-tech companies are capitalizing on classified military data to train AI models, with mainstream AI companies like OpenAI pivoting toward military partnerships, joining Microsoft, Amazon, and Google in working with the Pentagon.\n",
      "\n",
      "These trends indicate that 2025\n",
      "Source: https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/ - · AI-powered agents will do more with greater autonomy and help simplify your life at home and on the job. \n",
      "Source: https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/ - In 2025, a new generation of AI-powered agents will do more — even handling certain tasks on your behalf.  \n",
      "Source: https://sloanreview.mit.edu/article/five-trends-in-ai-and-data-science-for-2025/ - Let’s get agentic AI — the kind of AI that does tasks independently — out of the way first: It’s a sure bet for 2025’s “most trending AI trend.” Agent...\n",
      "Source: https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/ - · “Think of agents as the apps of the AI era,” says Charles Lamanna, corporate vice president of business and industry Copilot.\n",
      "Source: https://sloanreview.mit.edu/article/five-trends-in-ai-and-data-science-for-2025/ - The earliest agents will be those for small, structured internal tasks with little money involved — for instance, helping change your password on the ...\n",
      "Source: https://news.microsoft.com/source/features/ai/6-ai-trends-youll-see-more-of-in-2025/ - Models with advanced reasoning capabilities, like OpenAI o1, can already solve complex problems with logical steps that are similar to how humans thin...\n",
      "Source: https://www.morganstanley.com/insights/articles/ai-trends-reasoning-frontier-models-2025-tmt - The world’s biggest tech companies are vying to refine cutting-edge uses for artificial intelligence utilizations: large language models’ ability to r...\n",
      "Source: https://www.techtarget.com/searchenterpriseai/tip/9-top-AI-and-machine-learning-trends - In 2025, expect businesses to push harder for measurable outcomes from generative AI: reduced costs, demonstrable ROI and efficiency gains. \n",
      "Source: https://www.techtarget.com/searchenterpriseai/tip/9-top-AI-and-machine-learning-trends - In 2025, expect businesses to push harder for measurable outcomes from generative AI: reduced costs, demonstrable ROI and efficiency gains. \n",
      "Source: https://www.techtarget.com/searchenterpriseai/tip/9-top-AI-and-machine-learning-trends - In a September 2024 research report, Informa TechTarget's Enterprise Strategy Group found that, although over 90% of organizations had increased their...\n",
      "Source: https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/ - Expect this trend to continue next year, and to see more data sets and models that are aimed specifically at scientific discovery. \n",
      "Source: https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/ - One potential area is materials science. Meta has released massive data sets and models that could help scientists use AI to discover new materials mu...\n",
      "Source: https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/ - Meta has released massive data sets and models that could help scientists use AI to discover new materials much faster, and in December, Hugging Face,...\n",
      "Source: https://www.techtarget.com/searchenterpriseai/tip/9-top-AI-and-machine-learning-trends - But, as the technology matures, AI developers, end users and business customers alike are looking beyond chatbots. \"People need to think more creative...\n",
      "Source: https://spectrum.ieee.org/ai-index-2025 - That means inference costs, or the expense of querying a trained model, are falling dramatically. \n",
      "Source: https://spectrum.ieee.org/ai-index-2025 - The report notes that the blue line represents a drop from $20 per million tokens to $0.07 per million tokens; the pink line shows a drop from $15 per...\n",
      "Source: https://hai.stanford.edu/ai-index/2025-ai-index-report - Driven by increasingly capable small models, the inference cost for a system performing at the level of GPT-3.5 dropped over 280-fold between November...\n",
      "Source: https://spectrum.ieee.org/ai-index-2025 - In January 2024, the top U.S. model outperformed the best Chinese model by 9.26 percent; by February 2025, this gap had narrowed to just 1.70 percent....\n",
      "Source: https://hai.stanford.edu/ai-index/2025-ai-index-report - Open-weight models are also closing the gap with closed models, reducing the performance difference from 8% to just 1.7% on some benchmarks in a singl...\n",
      "Source: https://hai.stanford.edu/ai-index/2025-ai-index-report - In 2024, U.S. federal agencies introduced 59 AI-related regulations—more than double the number in 2023—and issued by twice as many agencies. Globally...\n",
      "Source: https://sloanreview.mit.edu/article/five-trends-in-ai-and-data-science-for-2025/ - Generative AI has had another impact on organizations: It’s making unstructured data important again. In the 2025 AI & Data Leadership Executive Bench...\n",
      "Source: https://www.morganstanley.com/insights/articles/ai-trends-reasoning-frontier-models-2025-tmt - Executives also highlighted the “data lakehouse revolution”—a trend to create unified data platforms that combine data lakes’ low-cost storage and fle...\n",
      "Source: https://www.technologyreview.com/2025/01/08/1109188/whats-next-for-ai-in-2025/ - In 2025, these trends will continue to be a boon for defense-tech companies like Palantir, Anduril, and others, which are now capitalizing on classifi...\n"
     ]
    }
   ],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(\n",
    "    model=\"claude-sonnet-4-0\",\n",
    "    max_tokens=1024,\n",
    "    tools=[\n",
    "        {\n",
    "            \"type\": \"web_search_20250305\",\n",
    "            \"name\": \"web_search\",\n",
    "            \"max_uses\": 3,  # Limit to 3 searches\n",
    "        }\n",
    "    ],\n",
    ")\n",
    "\n",
    "# Get response with citations\n",
    "response = llm.complete(\"What are the latest AI research trends?\")\n",
    "\n",
    "# Access the main response content\n",
    "print(response.text)\n",
    "\n",
    "# Access citations if available\n",
    "for citation in response.citations:\n",
    "    print(f\"Source: {citation.get('url')} - {citation.get('cited_text')}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51788716",
   "metadata": {},
   "source": [
    "## Tool Calling + Citations\n",
    "\n",
    "In `llama-index-core>=0.12.46` + `llama-index-llms-anthropic>=0.7.6`, we've added support for outputting citable tool results!\n",
    "\n",
    "With Anthropic, you can now use server-side citations to cite specific parts of your tool results.\n",
    "\n",
    "If the LLM cites a tool result, the citation will appear in the output as a `CitationBlock`, containing the source, title, and cited content.\n",
    "\n",
    "Let's cover a few ways to do this in practice.\n",
    "\n",
    "First, let's define a dummy tool/function that returns a citable block."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ba8804f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "from llama_index.core.llms import CitableBlock, TextBlock\n",
    "from llama_index.core.tools import FunctionTool\n",
    "\n",
    "dummy_text = Document.example().text\n",
    "\n",
    "\n",
    "async def search_fn(query: str):\n",
    "    \"\"\"Useful for searching the web to answer questions.\"\"\"\n",
    "    return CitableBlock(\n",
    "        content=[TextBlock(text=dummy_text)],\n",
    "        title=\"Facts about LLMs and LlamaIndex\",\n",
    "        source=\"https://docs.llamaindex.ai\",\n",
    "    )\n",
    "\n",
    "\n",
    "search_tool = FunctionTool.from_defaults(search_fn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "92d90047",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.anthropic import Anthropic\n",
    "\n",
    "llm = Anthropic(\n",
    "    model=\"claude-sonnet-4-0\",\n",
    "    # api_key=\"sk-...\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f79e0b03",
   "metadata": {},
   "source": [
    "### Agents + Citable Tools\n",
    "\n",
    "You can use these citable tools directly in pre-built agents, such as the `FunctionAgent`, and the same citations appear in the output."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b2c19c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.agent.workflow import FunctionAgent\n",
    "\n",
    "agent = FunctionAgent(\n",
    "    tools=[search_tool],\n",
    "    llm=llm,\n",
    "    # Since we have a fake tool that returns a static result, we don't want to waste LLM tokens\n",
    "    system_prompt=\"Only make one search query per user message.\",\n",
    "    timeout=None,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87ad1c0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "output = await agent.run(\"How do LlamaIndex and LLMs work together?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b17453f1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the search results, I can explain how LlamaIndex and LLMs work together:\n",
      "\n",
      "\n",
      "LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data. However, there's a key challenge: How do we best augment LLMs with our own private data?\n",
      "\n",
      "This is where LlamaIndex comes in as the solution. LlamaIndex is a \"data framework\" to help you build LLM apps. Here's how they work together:\n",
      "\n",
      "## Data Integration and Structure\n",
      "LlamaIndex offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.) and provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.\n",
      "\n",
      "## Enhanced Query Interface\n",
      "LlamaIndex provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output. This means when you ask a question, LlamaIndex retrieves relevant information from your private data and provides it as context to the LLM, enabling more accurate and personalized responses.\n",
      "\n",
      "## Flexible Integration\n",
      "LlamaIndex allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).\n",
      "\n",
      "## User-Friendly Design\n",
      "LlamaIndex provides tools for both beginner users and advanced users. The high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. The lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit\n",
      "--------------------------------------------------------------------------------\n",
      "Source:  https://docs.llamaindex.ai\n",
      "Title:  Facts about LLMs and LlamaIndex\n",
      "Cited Content:\n",
      " \n",
      "Context\n",
      "LLMs are a phenomenal piece of technology for knowledge generation and reasoning.\n",
      "They are pre-trained on large amounts of publicly available data.\n",
      "How do we best augment LLMs with our own private data?\n",
      "We need a comprehensive toolkit to help perform this data augmentation for LLMs.\n",
      "\n",
      "Proposed Solution\n",
      "That's where LlamaIndex comes in. LlamaIndex is a \"data framework\" to help\n",
      "you build LLM  apps. It provides the following tools:\n",
      "\n",
      "Offers data connectors to ingest your existing data sources and data formats\n",
      "(APIs, PDFs, docs, SQL, etc.)\n",
      "Provides ways to structure your data (indices, graphs) so that this data can be\n",
      "easily used with LLMs.\n",
      "Provides an advanced retrieval/query interface over your data:\n",
      "Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.\n",
      "Allows easy integrations with your outer application framework\n",
      "(e.g. with LangChain, Flask, Docker, ChatGPT, anything else).\n",
      "LlamaIndex provides tools for both beginner users and advanced users.\n",
      "Our high-level API allows beginner users to use LlamaIndex to ingest and\n",
      "query their data in 5 lines of code. Our lower-level APIs allow advanced users to\n",
      "customize and extend any module (data connectors, indices, retrievers, query engines,\n",
      "reranking modules), to fit their needs.\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import CitationBlock\n",
    "\n",
    "print(output.response.content)\n",
    "print(\"----\" * 20)\n",
    "for block in output.response.blocks:\n",
    "    if isinstance(block, CitationBlock):\n",
    "        print(\"Source: \", block.source)\n",
    "        print(\"Title: \", block.title)\n",
    "        print(\"Cited Content:\\n\", block.cited_content.text)\n",
    "        print(\"----\" * 20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b9a2c2b",
   "metadata": {},
   "source": [
    "### Manual Tool Calling + Citations\n",
    "\n",
    "Using our tool that returns a citable block, we can manually call the LLM with the given tool in a manual agent loop.\n",
    "\n",
    "Once the LLM stops making tool calls, we can return the final response and parse the citations from the response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4261960f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Based on the search results, I can explain how LlamaIndex and LLMs work together:\n",
      "\n",
      "\n",
      "LlamaIndex is a \"data framework\" to help you build LLM apps\n",
      ". The integration works by addressing a key challenge: \n",
      "while LLMs are a phenomenal piece of technology for knowledge generation and reasoning and are pre-trained on large amounts of publicly available data, we need a comprehensive toolkit to help perform data augmentation for LLMs with our own private data\n",
      ".\n",
      "\n",
      "Here's how LlamaIndex and LLMs work together:\n",
      "\n",
      "## Data Integration\n",
      "\n",
      "LlamaIndex offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.)\n",
      ", allowing you to bring your private data into a format that LLMs can work with.\n",
      "\n",
      "## Data Structuring\n",
      "\n",
      "LlamaIndex provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs\n",
      ". This structuring is crucial for making your data accessible and searchable by the LLM.\n",
      "\n",
      "## Enhanced Querying\n",
      "\n",
      "LlamaIndex provides an advanced retrieval/query interface over your data: Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output\n",
      ". This means when you ask the LLM a question, LlamaIndex retrieves relevant information from your data and provides it as context to enhance the LLM's response.\n",
      "\n",
      "## Application Integration\n",
      "\n",
      "LlamaIndex allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else)\n",
      ", making it flexible to incorporate into existing systems.\n",
      "\n",
      "The framework is designed to be accessible to users at different levels: \n",
      "LlamaIndex's high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code, while\n",
      "--------------------------------------------------------------------------------\n",
      "Source:  https://docs.llamaindex.ai\n",
      "Title:  Facts about LLMs and LlamaIndex\n",
      "Cited Content:\n",
      " \n",
      "Context\n",
      "LLMs are a phenomenal piece of technology for knowledge generation and reasoning.\n",
      "They are pre-trained on large amounts of publicly available data.\n",
      "How do we best augment LLMs with our own private data?\n",
      "We need a comprehensive toolkit to help perform this data augmentation for LLMs.\n",
      "\n",
      "Proposed Solution\n",
      "That's where LlamaIndex comes in. LlamaIndex is a \"data framework\" to help\n",
      "you build LLM  apps. It provides the following tools:\n",
      "\n",
      "Offers data connectors to ingest your existing data sources and data formats\n",
      "(APIs, PDFs, docs, SQL, etc.)\n",
      "Provides ways to structure your data (indices, graphs) so that this data can be\n",
      "easily used with LLMs.\n",
      "Provides an advanced retrieval/query interface over your data:\n",
      "Feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.\n",
      "Allows easy integrations with your outer application framework\n",
      "(e.g. with LangChain, Flask, Docker, ChatGPT, anything else).\n",
      "LlamaIndex provides tools for both beginner users and advanced users.\n",
      "Our high-level API allows beginner users to use LlamaIndex to ingest and\n",
      "query their data in 5 lines of code. Our lower-level APIs allow advanced users to\n",
      "customize and extend any module (data connectors, indices, retrievers, query engines,\n",
      "reranking modules), to fit their needs.\n",
      "\n",
      "--------------------------------------------------------------------------------\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage, CitationBlock\n",
    "\n",
    "chat_history = [\n",
    "    ChatMessage(\n",
    "        role=\"system\",\n",
    "        # Since we have a fake tool that returns a static result, we don't want to waste LLM tokens\n",
    "        content=\"Only make one search query per user message.\",\n",
    "    ),\n",
    "    ChatMessage(\n",
    "        role=\"user\", content=\"How do LlamaIndex and LLMs work together?\"\n",
    "    ),\n",
    "]\n",
    "resp = llm.chat_with_tools([search_tool], chat_history=chat_history)\n",
    "chat_history.append(resp.message)\n",
    "\n",
    "tool_calls = llm.get_tool_calls_from_response(\n",
    "    resp, error_on_no_tool_call=False\n",
    ")\n",
    "while tool_calls:\n",
    "    for tool_call in tool_calls:\n",
    "        if tool_call.tool_name == \"search_fn\":\n",
    "            tool_result = search_tool.call(**tool_call.tool_kwargs)\n",
    "            chat_history.append(\n",
    "                ChatMessage(\n",
    "                    role=\"tool\",\n",
    "                    blocks=tool_result.blocks,\n",
    "                    additional_kwargs={\"tool_call_id\": tool_call.tool_id},\n",
    "                )\n",
    "            )\n",
    "\n",
    "    resp = llm.chat_with_tools([search_tool], chat_history=chat_history)\n",
    "    chat_history.append(resp.message)\n",
    "    tool_calls = llm.get_tool_calls_from_response(\n",
    "        resp, error_on_no_tool_call=False\n",
    "    )\n",
    "\n",
    "print(resp.message.content)\n",
    "print(\"----\" * 20)\n",
    "for block in resp.message.blocks:\n",
    "    if isinstance(block, CitationBlock):\n",
    "        print(\"Source: \", block.source)\n",
    "        print(\"Title: \", block.title)\n",
    "        print(\"Cited Content:\\n\", block.cited_content.text)\n",
    "        print(\"----\" * 20)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
